Abstract

In modern communication systems such as the Internet, random losses of 
information can be mitigated by oversampling the source. This is equivalent 
to expanding the source using overcomplete systems of vectors (frames), as 
opposed to the traditional basis expansions. Dependencies among the coefficients
in frame expansions often allow for better performance compared to
bases under random losses of coefficients. We show that for any $n$-dimensional
frame, any source can be linearly reconstructed from only $\sim n \log n$ randomly
chosen frame coefficients, with a small error and with high probability. Thus
every frame expansion withstands random losses better (for worst case sources)
than the orthogonal basis expansion, for which the $n \log n$ bound is attained.
The proof reduces to M. Rudelson's selection theorem on random vectors in the
isotropic position, which is based on the non-commutative Khinchine inequality.

1 Introduction 

Representation of signals using frames, which are overcomplete sets of vectors, is
advantageous over basis expansions in a variety of practical applications. Dependencies
among the coefficients of the overcomplete representations guarantee better stability
in the presence of noise, quantization and erasures, as well as greater freedom of design
compared to bases. This general paradigm is confirmed by many experiments and
some theoretical work; see e.g. [D], [G2], [GVT], [GKK], [EPTT], [BU], [HK] and the
bibliography contained therein.

Of particular importance are the dependencies contained in frame expansions for 
design of communication systems. The redundancy of frames can mitigate random 
losses of expansion coefficients that occur in packet-based communication systems
such as the Internet. Detection and retransmission of lost packets in such systems
takes much longer than their successful transmission. This is the main source of delays
known to all network users. Such delays are unacceptable for many applications, such
as real-time video. It is thus desirable for the receiver to be able to approximately
reconstruct the information sent to him from whatever packets he receives, despite 
the loss of some packets. There should exist certain dependencies among the packets, 
otherwise the information contained in a missing packet would be irrevocably lost. 
Then, what is the best way to distribute the information among the packets so that 
each packet is equally important? Equivalently, this is the problem of the Multiple 
Description Coding (MDC) theory, where one wishes to communicate information 
over a set of parallel channels, each of which either works perfectly or not at all. 

The idea, which originated in [GKK], was to use frame expansions to distribute the
information among the packets with some dependencies. One can view this communication
scheme as follows:

The source information is viewed as a vector $x \in \mathbb{R}^n$. This vector is represented by
its $m > n$ expansion coefficients with respect to some fixed frame. These coefficients
are sent over the network in $m$ packets, one coefficient per packet. Due to unpredictable
communication losses, the user receives only a random subset of these packets, say
$k$ on average. The user applies the linear reconstruction to the received coefficients
in the hope that the reconstruction error will be small with large probability. The
fundamental problem is (see footnote 1):



How many random coefficients of a frame expansion does the user need 
to receive to be able to linearly reconstruct the source vector with a small 
error and with large probability? 

The work on this question, both theoretical and experimental, was initiated in
[GKK] and continued in [KDG] and [CK]; see also the survey paper [G2]. Both cases
were considered: $k < n$, which clearly requires a statistical model of the input vector $x$,
and $k > n$. The performance of the frame representations was compared to that of
the classical block channel-coded basis representations.

(Footnote 1: In this paper, we neglect the quantization issues, which are treated in [GVT] and [GKK].)

In the present paper we look for the best bound on $k$ which works for all frames and
all source vectors $x$. Does every frame necessarily perform better than the trivial
frame, the orthonormal basis, or more generally an orthonormal basis in $\mathbb{R}^n$ each
of whose elements is repeated $s$ times? Communicating a source vector $x$ with the
trivial frame is equivalent to sending each of the $n$ coefficients of the orthonormal
expansion of $x$ precisely $s$ times. To be able to reconstruct $x$, the user must receive
each of the $n$ coefficients at least once. This is possible with probability at least $1 - \varepsilon$
only if the user receives $k \ge C(\varepsilon) n \log n$ random coefficients in total. This gives the
lower bound on $k$ in the question above. Remarkably, the upper bound matches.
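The $n \log n$ lower bound for the trivial frame is the classical coupon collector count. The sketch below illustrates it under a simplifying assumption that is not made in the text: the received coefficients are modelled as uniform draws with replacement from the $n$ distinct basis coefficients (the limit of large repetition number $s$). The function names are hypothetical and chosen only for this illustration.

```python
from fractions import Fraction
from math import comb, log


def expected_draws(n: int) -> float:
    """Expected number of uniform draws with replacement needed to see
    all n distinct coefficients at least once: n * (1 + 1/2 + ... + 1/n)."""
    return n * sum(1.0 / j for j in range(1, n + 1))


def prob_all_received(n: int, k: int) -> float:
    """Exact probability that k uniform draws with replacement cover all
    n coefficients (inclusion-exclusion over the set of missed ones)."""
    total = sum((-1) ** j * comb(n, j) * Fraction(n - j, n) ** k
                for j in range(n + 1))
    return float(total)


if __name__ == "__main__":
    # Compare the expected number of draws with n log n for a few sizes.
    for n in (10, 100, 1000):
        print(n, round(expected_draws(n), 1), round(n * log(n), 1))
```

In this simplified model the expected number of draws exceeds $n \log n$, and the probability of covering all $n$ coefficients is close to one only once $k$ is a suitable multiple of $n \log n$.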

Theorem 1.1 For any uniform tight frame in $\mathbb{R}^n$ and any source vector $x$, the
linear reconstruction from $k$ random coefficients of $x$ yields an approximation error
at most $\varepsilon$ with probability $1 - \varepsilon$, provided $k \ge C(\varepsilon) n \log n$.

Here $C(\varepsilon)$ is a constant that depends only on $\varepsilon$; this dependence is discussed in the
next section. Tightness of a frame is assumed here only for simplicity.

Note that the optimal bound on k does not depend on the size m of the frame, 
so there may be many lost coefficients - in fact, most of them may be lost. Hence it 
is not the number of the lost coefficients that determines the performance but the 
number k of received coefficients. 

As argued in [G2], one advantage of frame representations over the traditional
block channel-coded basis representations is that frames allow for a real-time
reconstruction of the source. The receiver can attempt to reconstruct a source vector,
such as a still image or video, in real time as the packets arrive, starting from the
very first successfully received coefficient. Within one communication session, the
number of received coefficients $k$ will thus grow in time from 1 to possibly $m$, and
the quality of reconstruction will improve as more coefficients arrive. (In contrast to
this, in the block channel-coded bases model the user must wait until $n$ coefficients
arrive.) Theorem 1.1 states that, with any frame design and any source, the
reconstruction quality will reach a nearly optimal level as soon as $\sim n \log n$ coefficients
are received, so one may stop the session then.

Theorem 1.1 shows that every frame must withstand random losses better than
the trivial frame, the one formed by repeating the elements of the orthogonal basis. Of
course, there exist frames that perform better than the trivial frame. The problem
of optimal design of such frames is addressed in [GKK] and [CK]. As noticed e.g. in
[GVT], a set of $m = sn$ random points $(x_i)$ taken independently with the uniform
distribution on the unit sphere $S^{n-1}$ forms a frame which approaches a tight frame
with large probability, provided the redundancy $s \to \infty$. Consequently, a random
$k$-element subset of this set also forms an almost tight frame with large probability,
provided $k \ge tn$ and $t$ is large. Then one can linearly reconstruct any source vector
$x$ using its $k$ random coefficients with respect to the frame $(x_i)$ with probability
$1 - \varepsilon$, provided $k \ge C(\varepsilon) n$. Hence for this frame, the logarithmic factor is not needed
in the number of received coefficients $k$.

Our proof of Theorem 1.1 is based on a result of M. Rudelson in asymptotic
convex geometry about vectors in the isotropic position [R2]. There exists a remarkable
equivalence of the theories. All of the following classes coincide in $\mathbb{R}^n$ (up to
an appropriate rescaling), see [V]:

• the class of tight frames, 

• the class of contact points of convex bodies, 

• the class of John's decompositions of the identity, 

• the class of vectors in the isotropic position. 

The selection theorem of M. Rudelson [R2] can thus be interpreted as a result about
frames, which leads to Theorem 1.1. In order to obtain a probability exponentially
close to one in Theorem 1.1, and because of a slightly different model of random
selection in M. Rudelson's theorem, we will prove the latter with some necessary
modifications. Two proofs of Rudelson's theorem are known. The one which was
historically the first [R1] uses majorizing measures, a deep technique in modern
probability theory developed by M. Talagrand (see [T]). The other proof [R2] is
the one we follow in the present paper. It is based on the noncommutative operator
theory, more precisely on the noncommutative Khinchine's inequality due to
F. Lust-Piquard and G. Pisier (see [LP], [Pi], [R2]).

Section 2 relates the frames to the decompositions of the identity and offers a
precise form of Theorem 1.1. Section 3 discusses the noncommutative Khinchine's
inequality and Pisier's proof of Rudelson's lemma. In Section 4 we show how
Rudelson's lemma implies a precise form of Theorem 1.1.

2 Frames as decompositions of identity and their random parts

For an introduction to frames, see [D] and [U]. A system of vectors $(x_i)$, finite or
infinite, in a Hilbert space is called a frame if there exist $A > 0$ and $B > 0$ (the
frame bounds) such that

\[ A \|x\|^2 \le \sum_i |\langle x_i, x \rangle|^2 \le B \|x\|^2 \qquad (1) \]

holds for all $x$ in the space.

Our Hilbert space will be $\mathbb{R}^n$ with its canonical scalar product. We will specialize
to uniform frames, those for which $\|x_i\| = 1$ for all $i$, and to tight frames, for which
$A = B$. The reason for considering only tight frames is the simple fact that a frame
has frame bounds $(A, B)$ if and only if it is $\sqrt{B/A}$-equivalent to some tight frame
(see [C]). By being $M$-equivalent we mean that there exists a linear operator $T$ that
maps elements of one frame to the other with $\|T\| \|T^{-1}\| \le M$.

We will view frame expansions as decompositions of identity. A pair of vectors
$(x, y)$ in $\mathbb{R}^n$ defines a one-dimensional linear operator $x \otimes y$ given by
$(x \otimes y)(z) = \langle x, z \rangle y$. Then for any system of vectors $(x_i)_{i=1}^m$ with $\|x_i\| = 1$
and for the identity operator $\mathrm{id}$ on $\mathbb{R}^n$ one has

\[ (x_i)_{i=1}^m \text{ is a uniform tight frame in } \mathbb{R}^n \text{ if and only if } \mathrm{id} = \frac{n}{m} \sum_{i=1}^m x_i \otimes x_i. \qquad (2) \]
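The decomposition (2) can be checked numerically on a concrete example. The sketch below uses $m$ equally spaced unit vectors on the circle, which form a uniform tight frame in $\mathbb{R}^2$ for $m \ge 3$; this planar frame and the helper names are assumptions made for illustration and do not come from the text.

```python
import math


def planar_frame(m):
    """m unit vectors equally spaced on the unit circle in R^2."""
    return [(math.cos(2 * math.pi * i / m), math.sin(2 * math.pi * i / m))
            for i in range(m)]


def scaled_frame_operator(frame, n):
    """Matrix of (n/m) * sum_i x_i (x) x_i, where x (x) x acts as
    z -> <x, z> x, i.e. has entries x[r] * x[c]."""
    m = len(frame)
    op = [[0.0] * n for _ in range(n)]
    for x in frame:
        for r in range(n):
            for c in range(n):
                op[r][c] += (n / m) * x[r] * x[c]
    return op


if __name__ == "__main__":
    # For m >= 3 the result should be the 2 x 2 identity up to rounding.
    for row in scaled_frame_operator(planar_frame(5), 2):
        print([round(v, 12) for v in row])
```

For $m = 2$ the two vectors are collinear, the system is not a frame, and the same operator is far from the identity, which shows that (2) is a genuine restriction.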

The communication scheme based on a uniform tight frame works as follows.

A source vector $x \in \mathbb{R}^n$ is represented through the expansion (2), i.e.

\[ x = \frac{n}{m} \sum_{i=1}^m \langle x_i, x \rangle x_i, \]

and the coefficients $y(i) := \langle x_i, x \rangle$, $i = 1, \ldots, m$, are sent over the network. At
each given time during the communication session, the user has received a random
subset $\sigma \subset \{1, \ldots, m\}$ of these coefficients. The user applies to them the linear
reconstruction, computing

\[ \hat{x} = \frac{n}{|\sigma|} \sum_{i \in \sigma} \langle x_i, x \rangle x_i \qquad (3) \]

in the hope that the error $\|x - \hat{x}\|$ will be small with large probability. The question
is: how large should $|\sigma|$ be for this to hold?

More formally, the random subset $\sigma$ is realized by including each element of
$\{1, \ldots, m\}$ into $\sigma$ independently with probability $k/m$, where $0 < k < m$ is some
fixed number. Then $\sigma$ is a random subset of $\{1, \ldots, m\}$ of average size $k$.

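Theorem 2.1 Let $(x_i)_{i=1}^m$ be a uniform tight frame in $\mathbb{R}^n$, and $\varepsilon > 0$. Let $\sigma$ be a
random subset of $\{1, \ldots, m\}$ of average size $k \ge C (n/\varepsilon^2) \log(n/\varepsilon^2)$. Then

\[ \mathbb{P}\Bigl\{ \Bigl\| \mathrm{id} - \frac{n}{|\sigma|} \sum_{i \in \sigma} x_i \otimes x_i \Bigr\| > \varepsilon t \Bigr\} \le C e^{-t^2} \]

in the (only interesting) range $0 < t < 1/\varepsilon$.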

Here and thereafter $C, C_1, \ldots$ denote absolute constants, whose values for
convenience may be different from line to line (but they do not depend on anything).

Theorem 2.1 gives an asymptotically optimal bound on the required number $k$
of received coefficients in communication scheme (3):
Corollary 2.2 Let $(x_i)_{i=1}^m$ be a uniform tight frame in $\mathbb{R}^n$. Let $\varepsilon \in (0, 1)$, $t \ge 1$ and
$k \ge C (n/\varepsilon^2) \log(n/\varepsilon^2)$. With probability at least $1 - C e^{-t^2}$, the linear reconstruction
(3) from a random subset $\sigma$ of average size $k$ gives the error

\[ \|x - \hat{x}\| \le \varepsilon t \|x\| \quad \text{for all possible sources } x \in \mathbb{R}^n. \]

■

Theorem 1.1 clearly follows from Corollary 2.2.

Remark. The proof also shows that the average approximation error in Corollary 2.2
is small, $\mathbb{E}\|x - \hat{x}\| \le \varepsilon \|x\|$.
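The communication scheme and the reconstruction (3) can be simulated in a few lines. The following is a minimal sketch under assumptions that are not part of the text: it uses the planar frame of $m$ equally spaced unit vectors in $\mathbb{R}^2$ as the uniform tight frame, a fixed unit source vector, and hypothetical helper names; the random subset is realized by independent inclusion with probability $k/m$, as described above.

```python
import math
import random


def planar_frame(m):
    """m unit vectors equally spaced on the circle (tight in R^2 for m >= 3)."""
    return [(math.cos(2 * math.pi * i / m), math.sin(2 * math.pi * i / m))
            for i in range(m)]


def random_subset(m, k, rng):
    """Include each index independently with probability k/m (average size k)."""
    return [i for i in range(m) if rng.random() < k / m]


def reconstruct(frame, coeffs, received, n):
    """Linear reconstruction (3): (n/|sigma|) * sum_{i in sigma} <x_i, x> x_i."""
    if not received:
        raise ValueError("no coefficients received")
    x_hat = [0.0] * n
    for i in received:
        for r in range(n):
            x_hat[r] += coeffs[i] * frame[i][r]
    return [n / len(received) * v for v in x_hat]


def simulate(m=2000, k=400, seed=0):
    """Return (reconstruction error, number of received coefficients)."""
    rng = random.Random(seed)
    frame = planar_frame(m)
    x = (0.6, 0.8)  # illustrative unit source vector
    coeffs = [f[0] * x[0] + f[1] * x[1] for f in frame]
    sigma = random_subset(m, k, rng)
    x_hat = reconstruct(frame, coeffs, sigma, 2)
    return math.hypot(x[0] - x_hat[0], x[1] - x_hat[1]), len(sigma)


if __name__ == "__main__":
    print(simulate())
```

When all $m$ coefficients are received, the reconstruction is exact up to rounding; with erasures the error is random and shrinks as the number of received coefficients grows.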



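3 Noncommutative Khinchine's inequality and Rudelson's theorem

The main ingredient in the proof of Theorem 2.1 is the following result of M. Rudelson
[R2].

Lemma 3.1 (M. Rudelson) Let $(z_i)$ be a finite collection of vectors in $\mathbb{R}^d$, and let
$(\varepsilon_i)$ be independent symmetric $\{-1, 1\}$-valued random variables. Then for every $p \ge 1$

\[ \Bigl( \mathbb{E} \Bigl\| \sum_i \varepsilon_i z_i \otimes z_i \Bigr\|^p \Bigr)^{1/p} \le C (p + \log d)^{1/2} \max_i \|z_i\| \cdot \Bigl\| \sum_i z_i \otimes z_i \Bigr\|^{1/2}. \]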
G. Pisier ([Pi], see [R2]) discovered an approach to this result via the noncommutative
operator theory, which greatly simplified the original proof of M. Rudelson
[R1]. For completeness, we give a proof of Lemma 3.1, since only the case $p = 1$ was
treated explicitly in the literature.

Lemma 3.1 reduces to the noncommutative Khinchine inequality due to F. Lust-Piquard
and G. Pisier (see [LP], [Pi], [R2]). In the noncommutative operator theory,
the role of scalars is played by linear operators. Besides the usual operator norm,
an operator $Z$ on $\mathbb{R}^d$ has the norm in the Schatten class $C_p^d$ for $p \ge 1$, defined as
follows. Let $s_i(Z)$ be the s-numbers of $Z$, that is the eigenvalues of $(Z^* Z)^{1/2}$. The norm
in the Schatten class is then $\|Z\|_{C_p^d} = \bigl(\sum_{i=1}^d s_i(Z)^p\bigr)^{1/p}$.

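Theorem 3.2 (Non-commutative Khinchine's inequality [LP], [Pi], see [R2])
Let $2 \le p < \infty$. For any finite sequence $(Z_i)$ in $C_p^d$ one has

\[ R((Z_i)) \le \Bigl( \mathbb{E} \Bigl\| \sum_i \varepsilon_i Z_i \Bigr\|_{C_p^d}^p \Bigr)^{1/p} \le C \sqrt{p} \cdot R((Z_i)), \]

where

\[ R((Z_i)) = \max\Bigl( \Bigl\| \Bigl( \sum_i Z_i^* Z_i \Bigr)^{1/2} \Bigr\|_{C_p^d}, \; \Bigl\| \Bigl( \sum_i Z_i Z_i^* \Bigr)^{1/2} \Bigr\|_{C_p^d} \Bigr). \]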
In the scalar case, that is for $d = 1$, Theorem 3.2 is the classical Khinchine's
inequality (see e.g. [LT], Lemma 4.1).

Proof of Lemma 3.1. Note that for every $r \ge 1$ and every operator $Z \in C_r^d$,

\[ \|Z\|_{C_r^d} = \Bigl( \sum_{i=1}^d s_i(Z)^r \Bigr)^{1/r} \le d^{1/r} \max_i s_i(Z) = d^{1/r} \|Z\|. \]

Let $r = p + \log d$. Then $d^{1/r} \le e$, hence

\[ \|Z\| \le \|Z\|_{C_r^d} \le e \|Z\|. \qquad (4) \]
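The two inequalities in (4) can be checked directly. The short computation below is a sketch of the steps left implicit, using only the definitions above, namely $s_i(Z) \ge 0$, $\|Z\| = \max_i s_i(Z)$ and $r = p + \log d$ with $p > 0$.

```latex
\[
  \|Z\| = \max_i s_i(Z)
        \le \Bigl( \sum_{i=1}^d s_i(Z)^r \Bigr)^{1/r}
        = \|Z\|_{C_r^d},
  \qquad
  d^{1/r} = \exp\Bigl( \frac{\log d}{p + \log d} \Bigr) \le \exp(1) = e .
\]
```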



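We apply the noncommutative Khinchine's inequality for $Z_i = z_i \otimes z_i$. Note that
$Z_i^* Z_i = Z_i Z_i^* = \|z_i\|^2 z_i \otimes z_i$. By (4),

\[ \Bigl( \mathbb{E} \Bigl\| \sum_i \varepsilon_i z_i \otimes z_i \Bigr\|^p \Bigr)^{1/p}
   \le \Bigl( \mathbb{E} \Bigl\| \sum_i \varepsilon_i z_i \otimes z_i \Bigr\|_{C_r^d}^r \Bigr)^{1/r}
   \le C \sqrt{r} \, \Bigl\| \Bigl( \sum_i \|z_i\|^2 z_i \otimes z_i \Bigr)^{1/2} \Bigr\|_{C_r^d}
   \le C e \sqrt{r} \, \max_i \|z_i\| \cdot \Bigl\| \sum_i z_i \otimes z_i \Bigr\|^{1/2}. \]

In view of our choice of $r$, this completes the proof of Lemma 3.1.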
4 Proof of Theorem 2.1

Moments and tails. The tail probability in Theorem 2.1 can be computed by
estimating the moments. This is described in the following standard lemma. For an
$\alpha \ge 1$, the $\psi_\alpha$-norm of a random variable $Z$ is defined as

\[ \|Z\|_{\psi_\alpha} = \inf\{ \lambda > 0 : \mathbb{E} \exp |Z/\lambda|^\alpha \le e \}. \]

Lemma 4.1 (see [LT], Lemmas 3.7 and 4.10) Let $Z$ be a nonnegative random variable,
and let $\alpha = d/2$ for some positive integer $d$. The following are equivalent:

(i) there exists a constant $K > 0$ such that

\[ (\mathbb{E} Z^p)^{1/p} \le K p^\alpha \quad \text{for all } p \ge 2; \]

(ii) there exists a constant $K > 0$ such that

\[ \mathbb{P}\{ Z > K t \} \le 2 \exp(-t^{1/\alpha}) \quad \text{for all } t > 0; \]

(iii) there exists a constant $K > 0$ such that

\[ \|Z\|_{\psi_{1/\alpha}} \le K. \]

Furthermore, the constants in (i), (ii) and (iii) depend only on $\alpha$ and on each other.
Corollary 4.2 Let $Z$ be a nonnegative random variable and $p \ge 2$. Then

\[ (\mathbb{E} Z^p)^{1/p} \le C p \log(\mathbb{E} \exp Z). \]

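Proof. Let $M = \|Z\|_{\psi_1}$. Assume first that $M \ge 1$. We have

\[ \mathbb{E} \exp(Z/M) = e. \]

By Lemma 4.1, $(\mathbb{E} (Z/M)^p)^{1/p} \le C p$. Then by Jensen's inequality

\[ (\mathbb{E} Z^p)^{1/p} \le C p M = C p M \log(\mathbb{E} \exp(Z/M))
   = C p \log\bigl( (\mathbb{E} \exp(Z/M))^M \bigr)
   \le C p \log(\mathbb{E} \exp Z). \]

For a general nonnegative variable $Z$, note that $\|1 + Z\|_{\psi_1} \ge 1$, hence by the
previous argument

\[ (\mathbb{E} Z^p)^{1/p} \le (\mathbb{E} (1 + Z)^p)^{1/p} \le C p \log(\mathbb{E} \exp(1 + Z)) = C p \, (1 + \log \mathbb{E} \exp Z) \le 2 C p \log(\mathbb{E} \exp Z), \]

where the last step uses $\mathbb{E} \exp Z \ge e$, which holds in the application below (there $\mathbb{E} \exp Z = k$).
This completes the proof. ■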
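The use of Jensen's inequality in the case $M \ge 1$ can be spelled out as follows. This is a sketch of the intermediate step only; the auxiliary variable $Y = \exp(Z/M)$ is introduced here for illustration and relies on the convexity of $t \mapsto t^M$ on $[0, \infty)$ for $M \ge 1$.

```latex
\[
  \bigl( \mathbb{E}\, Y \bigr)^M \le \mathbb{E}\, Y^M = \mathbb{E} \exp Z
  \quad \text{for } Y = \exp(Z/M), \ M \ge 1,
\]
\[
  \text{so} \quad
  M \log\bigl( \mathbb{E} \exp(Z/M) \bigr)
  = \log\bigl( \mathbb{E}\, Y \bigr)^M
  \le \log\bigl( \mathbb{E} \exp Z \bigr).
\]
```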

Symmetrization. We start our proof of Theorem 2.1 with the decomposition (2),

\[ \mathrm{id} = \frac{n}{m} \sum_{i=1}^m x_i \otimes x_i. \]

To realize a random subset $\sigma$, we introduce selectors $(\delta_i)_{i=1}^m$, that is independent
$\{0, 1\}$-valued random variables with means $\mathbb{E} \delta_i = \delta$, where $\delta = k/m$. Then
$\sigma = \{ i : \delta_i = 1 \}$ is a random subset of $\{1, \ldots, m\}$ of average size $k$.
Disregarding for a moment the difference between the random size $|\sigma|$ and its mean
$k$, thanks to Lemma 4.1 we can compute the probability estimate in Theorem 2.1
by estimating the moments

\[ E_p := \Bigl( \mathbb{E} \Bigl\| \mathrm{id} - \frac{n}{k} \sum_{i \in \sigma} x_i \otimes x_i \Bigr\|^p \Bigr)^{1/p}
        = \Bigl( \mathbb{E} \Bigl\| \mathrm{id} - \frac{n}{k} \sum_{i=1}^m \delta_i x_i \otimes x_i \Bigr\|^p \Bigr)^{1/p} \]

for $p \ge 2$. This will be done in several steps.

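At the first step, we apply the classical symmetrization technique (see [LT] 6.2).
We look at $Y = \mathrm{id} - \frac{n}{k} \sum_{i=1}^m \delta_i x_i \otimes x_i$ as a random variable (random operator)
and consider its independent copy $Y'$. Since $\mathbb{E} Y' = 0$, Jensen's inequality yields
$\mathbb{E} \|Y\|^p \le \mathbb{E} \|Y - Y'\|^p$, hence

\[ E_p \le \Bigl( \mathbb{E} \Bigl\| \frac{n}{k} \sum_{i=1}^m (\delta_i - \delta'_i) x_i \otimes x_i \Bigr\|^p \Bigr)^{1/p}, \]

where $(\delta'_i)_{i=1}^m$ is an independent copy of $(\delta_i)_{i=1}^m$. Let $(\varepsilon_i)$ be a sequence of independent
symmetric $\{-1, 1\}$-valued random variables, independent of both $(\delta_i)$ and $(\delta'_i)$. Since
$\delta_i - \delta'_i$ is a symmetric random variable, it is distributed identically to $\varepsilon_i (\delta_i - \delta'_i)$. By
Minkowski's inequality,

\[ E_p \le \Bigl( \mathbb{E} \Bigl\| \frac{n}{k} \sum_{i=1}^m \varepsilon_i (\delta_i - \delta'_i) x_i \otimes x_i \Bigr\|^p \Bigr)^{1/p}
       \le 2 \Bigl( \mathbb{E} \Bigl\| \frac{n}{k} \sum_{i=1}^m \varepsilon_i \delta_i x_i \otimes x_i \Bigr\|^p \Bigr)^{1/p}. \qquad (5) \]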



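Bounding the moments. Let us fix a realization of the selectors $(\delta_i)$ (hence a set
$\sigma$) and denote by $\mathbb{E}_\varepsilon$ the expectation with respect to $(\varepsilon_i)$. The number of nonzero
elements among $z_i = \delta_i x_i$, $i = 1, \ldots, m$, is $d = |\sigma| = \sum_{i=1}^m \delta_i$. Consequently, we can
view $z_i$ as vectors in $\mathbb{R}^d$. Applying Lemma 3.1 to them, we obtain

\[ \Bigl( \mathbb{E}_\varepsilon \Bigl\| \frac{n}{k} \sum_{i=1}^m \varepsilon_i \delta_i x_i \otimes x_i \Bigr\|^p \Bigr)^{1/p}
   = \frac{n}{k} \Bigl( \mathbb{E}_\varepsilon \Bigl\| \sum_{i=1}^m \varepsilon_i z_i \otimes z_i \Bigr\|^p \Bigr)^{1/p}
   \le \frac{C n}{k} (p + \log |\sigma|)^{1/2} \Bigl\| \sum_{i=1}^m \delta_i x_i \otimes x_i \Bigr\|^{1/2}. \]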
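By (5) and the Cauchy-Schwarz inequality, we get

\[ E_p \le 2 \Bigl( \mathbb{E} \, \mathbb{E}_\varepsilon \Bigl\| \frac{n}{k} \sum_{i=1}^m \varepsilon_i \delta_i x_i \otimes x_i \Bigr\|^p \Bigr)^{1/p}
       \le 2 C \sqrt{\frac{n}{k}} \, \bigl[ \mathbb{E} (p + \log |\sigma|)^p \bigr]^{1/2p}
           \cdot \Bigl[ \mathbb{E} \Bigl\| \frac{n}{k} \sum_{i=1}^m \delta_i x_i \otimes x_i \Bigr\|^p \Bigr]^{1/2p}. \qquad (6) \]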



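The first expectation in (6) is estimated by Minkowski's inequality and Corollary 4.2
as

\[ \bigl[ \mathbb{E} (p + \log |\sigma|)^p \bigr]^{1/2p} \le \bigl[ p + (\mathbb{E} \log^p |\sigma|)^{1/p} \bigr]^{1/2}
   \le \bigl[ p + C p \log \mathbb{E} |\sigma| \bigr]^{1/2} = \bigl[ p + C p \log k \bigr]^{1/2} \le C (p \log k)^{1/2}. \]

The second expectation in (6) is estimated by Minkowski's inequality as

\[ \Bigl[ \mathbb{E} \Bigl\| \frac{n}{k} \sum_{i=1}^m \delta_i x_i \otimes x_i \Bigr\|^p \Bigr]^{1/2p} \le (1 + E_p)^{1/2}. \]

Summarizing, (6) becomes

\[ E_p \le C \Bigl( \frac{n \log k}{k} \, p \Bigr)^{1/2} (1 + E_p)^{1/2}. \]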



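Denoting $a = \frac{n \log k}{k}$ and solving for $E_p$, we have

\[ E_p \le C \bigl( \sqrt{ap} + ap \bigr), \]

thus

\[ \min(E_p, 1) \le C \sqrt{ap}. \]

Since $E_p = (\mathbb{E} Z^p)^{1/p}$ for $Z = \bigl\| \mathrm{id} - \frac{n}{k} \sum_{i \in \sigma} x_i \otimes x_i \bigr\|$, we have

\[ \bigl[ \mathbb{E} (\min(Z, 1))^p \bigr]^{1/p} \le \min(E_p, 1) \le C \sqrt{ap}. \]

By Lemma 4.1,

\[ \mathbb{P}\{ \min(Z, 1) > C_1 \sqrt{a} \, t \} \le 2 \exp(-t^2) \quad \text{for all } t > 0. \qquad (7) \]

Now recall the restriction on $k$ in Theorem 2.1: $k \ge C (n/\varepsilon^2) \log(n/\varepsilon^2)$. By choosing
$C$ large enough, we can make

\[ C_1 \sqrt{a} = C_1 \sqrt{\frac{n \log k}{k}} \le \varepsilon / 10. \]

In view of the definition of $Z$, (7) implies

\[ \mathbb{P}\Bigl\{ \Bigl\| \mathrm{id} - \frac{n}{k} \sum_{i \in \sigma} x_i \otimes x_i \Bigr\| > \frac{\varepsilon t}{10} \Bigr\} \le 2 \exp(-t^2) \quad \text{for all } 0 < t < 10/\varepsilon. \qquad (8) \]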
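Replacing the average size of the random set by its actual size. It remains
to replace $k$ by $|\sigma|$ in (8). Indeed, since $|\sigma| = \sum_{i=1}^m \delta_i$ is a sum of $m$ independent
$\{0, 1\}$-valued random variables $\delta_i$ with $\mathbb{E} \delta_i = \delta = k/m$, Bernstein's inequality (see
[Pe]) shows that for $s \le 2 \delta m = 2k$ one has

\[ \mathbb{P}\{ \, | |\sigma| - k | > s \} \le 2 \exp\Bigl( -\frac{s^2}{8 \delta m} \Bigr) = 2 \exp\Bigl( -\frac{s^2}{8k} \Bigr). \]

Then for $s = \varepsilon t k / 10$,

\[ \mathbb{P}\Bigl\{ \Bigl| \frac{|\sigma|}{k} - 1 \Bigr| > \frac{\varepsilon t}{10} \Bigr\} \le 2 \exp\Bigl( -\frac{\varepsilon^2 t^2 k}{800} \Bigr) \le 2 \exp(-t^2). \]

If both events $\bigl| \frac{|\sigma|}{k} - 1 \bigr| \le \frac{\varepsilon t}{10}$ and $\bigl\| \mathrm{id} - \frac{n}{k} \sum_{i \in \sigma} x_i \otimes x_i \bigr\| \le \frac{\varepsilon t}{10}$ hold, which happens with
probability at least $1 - 4 \exp(-t^2)$, then by the triangle inequality
$\bigl\| \frac{n}{k} \sum_{i \in \sigma} x_i \otimes x_i \bigr\| \le 1 + \frac{\varepsilon t}{10} < 2$, hence

\[ \Bigl\| \mathrm{id} - \frac{n}{|\sigma|} \sum_{i \in \sigma} x_i \otimes x_i \Bigr\|
   \le \Bigl\| \mathrm{id} - \frac{n}{k} \sum_{i \in \sigma} x_i \otimes x_i \Bigr\|
     + \Bigl| 1 - \frac{k}{|\sigma|} \Bigr| \cdot \Bigl\| \frac{n}{k} \sum_{i \in \sigma} x_i \otimes x_i \Bigr\|
   \le \frac{\varepsilon t}{10} + \frac{4 \varepsilon t}{10} < \varepsilon t. \]

Thus $k$ may be replaced by $|\sigma|$ in (8) at the cost of replacing $\varepsilon t / 10$ by $\varepsilon t$. This completes
the proof of Theorem 2.1. ■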
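The concentration of $|\sigma|$ around $k$ used in the last step can be compared with the exact binomial tail. The sketch below is only an illustration with hypothetical function names: it computes $\mathbb{P}\{ | |\sigma| - k | > s \}$ exactly for $|\sigma| \sim \mathrm{Binomial}(m, k/m)$ and evaluates the bound $2 \exp(-s^2/(8k))$ quoted above for $s \le 2k$.

```python
from math import comb, exp


def binomial_deviation_prob(m: int, k: int, s: float) -> float:
    """Exact P(| |sigma| - k | > s) when |sigma| ~ Binomial(m, k/m)."""
    delta = k / m
    return sum(comb(m, j) * delta ** j * (1 - delta) ** (m - j)
               for j in range(m + 1) if abs(j - k) > s)


def bernstein_bound(k: int, s: float) -> float:
    """The bound 2 * exp(-s^2 / (8k)) used in the text for s <= 2k."""
    return 2 * exp(-s * s / (8 * k))


if __name__ == "__main__":
    m, k = 200, 50
    for s in (10, 20, 50, 100):
        print(s, binomial_deviation_prob(m, k, s), bernstein_bound(k, s))
```

The exact tail is typically much smaller than the bound, which is all that the argument needs.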